Bayes By Backprop Neural Networks for Dialogue Management

Author

  • Christopher Tegho
Abstract

In dialogue management for statistical spoken dialogue systems, an agent learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful dialogue policy estimation. Current deep reinforcement learning methods are very promising but rely on ε-greedy exploration, which is less sample efficient than methods that use uncertainty estimates, such as Gaussian Process SARSA (GPSARSA). This thesis implements Bayes-By-Backpropagation, a method to extract uncertainty estimates from deep Q-networks (DQN). These uncertainty estimates are used to guide exploration. We show that Bayes-By-Backpropagation DQN (BBQN) explores more efficiently and converges faster to an optimal policy than ε-greedy based methods. It reaches performance comparable to GPSARSA, the state of the art in policy optimization, especially on more complex domains, without the high computational complexity of Gaussian Processes. We also implement α-divergences, variational dropout, and negative log-likelihood minimization as other means to extract uncertainty estimates from DQN, and compare their performance to BBQN and DQN. This work is carried out within the Cambridge University Engineering Department dialogue systems toolkit, CUED-pydial.
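The contrast between ε-greedy exploration and exploration guided by weight uncertainty can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis or from CUED-pydial: the Q-function is linear rather than a deep network, the approximate posterior is a factorised Gaussian with mean `mu` and standard deviation `softplus(rho)`, and all function names are hypothetical. Action selection by sampling weights from the posterior is the Thompson-sampling style of exploration commonly paired with Bayes-by-Backprop.

```python
import numpy as np

def softplus(x):
    # Maps an unconstrained parameter rho to a positive standard deviation.
    return np.log1p(np.exp(x))

def sample_weights(mu, rho, rng):
    # Reparameterised draw: w = mu + sigma * eps, with eps ~ N(0, I).
    sigma = softplus(rho)
    return mu + sigma * rng.standard_normal(mu.shape)

def q_values(belief, weights):
    # Hypothetical linear Q-function: one row of weights per action.
    return weights @ belief

def thompson_action(belief, mu, rho, rng):
    # Draw one weight sample from the approximate posterior, then act greedily on it.
    return int(np.argmax(q_values(belief, sample_weights(mu, rho, rng))))

def epsilon_greedy_action(belief, mu, epsilon, rng):
    # Baseline: uniform random action with probability epsilon, else greedy on the mean.
    if rng.random() < epsilon:
        return int(rng.integers(mu.shape[0]))
    return int(np.argmax(q_values(belief, mu)))

# Toy setup (illustrative values only): 3 actions, 2-dimensional belief state.
rng = np.random.default_rng(0)
belief = np.array([0.9, 0.1])
mu = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
confident_rho = np.full(mu.shape, -50.0)  # near-zero posterior variance
uncertain_rho = np.full(mu.shape, 3.0)    # large posterior variance

confident_choice = thompson_action(belief, mu, confident_rho, rng)
uncertain_choices = {thompson_action(belief, mu, uncertain_rho, rng) for _ in range(200)}
```

With a confident posterior the sampled policy coincides with the greedy one, while a wide posterior spreads choices across actions. The amount of exploration therefore follows the remaining uncertainty instead of a fixed ε.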


Similar resources

Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation

In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on ε-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches such...


Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann exploration, and bootstrapping-based approaches. Additional...


BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additiona...


Weight Uncertainty in Neural Networks

We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields...


Kickback Cuts Backprop's Red-Tape: Biologically Plausible Credit Assignment in Neural Networks

Error backpropagation is an extremely effective algorithm for assigning credit in artificial neural networks. However, weight updates under Backprop depend on lengthy recursive computations and require separate output and error messages – features not shared by biological neurons, that are perhaps unnecessary. In this paper, we revisit Backprop and the credit assignment problem. We first decomp...




Publication date: 2017